FOOD HYGIENE RATING SCHEME - CAMDEN

1. Introduction:

The Food Hygiene Rating Scheme (FHRS) dataset for Camden offers a rich and comprehensive reservoir of information pertaining to the hygiene and operational aspects of food establishments in the borough. As we embark on this Exploratory Data Analysis (EDA), our goal is to unveil patterns, trends, and noteworthy observations embedded within the dataset. This exploration not only seeks to provide a holistic understanding of the local food scene but also to extract valuable insights that can contribute to informed decision-making, policy formulation, and enhanced consumer awareness.

I chose this specific dataset because Camden, a vibrant borough in London known for its diverse culinary landscape, is home to a myriad of food establishments, ranging from traditional cafes and restaurants to contemporary takeaway joints and specialty retailers. The FHRS dataset encapsulates crucial details such as hygiene scores, business types, geographical locations, and other relevant metrics, making it a valuable resource for dissecting the dynamics of the local food industry.

At the heart of this exploration lies the recognition that the FHRS dataset is not merely a compilation of raw numbers; rather, it serves as a potent tool for generating actionable insights. The interplay between hygiene scores, business types, and geographical distributions unfolds a narrative that extends beyond immediate statistical interpretations. This narrative, driven by data, holds the potential to contribute significantly to informed decision-making, aid in the formulation of effective policies, and elevate consumer awareness.

2. Research Questions:

Reports of mouse droppings, flies on pizzas, and out-of-date food inside Camden’s zero and one-star restaurants, as detailed in a news article (https://www.hamhigh.co.uk/news/21348470.mouse-droppings-flies-pizzas-out-of-date-food---inside-camdens-zero-one-star-restaurants/), are alarming indicators of potential hygiene issues and food safety violations. Such problems in restaurants can pose serious health risks for customers. Recognizing the urgency and importance of addressing these issues, I am prioritizing an analysis to identify restaurants with both low and high ratings, investigate their geospatial locations, and specifically assess pubs and bars for hygiene and overall ratings. To my knowledge, no previous analysis has been conducted on this topic, which underlines the need for a comprehensive examination to support public health and safety.

3. Data Source:

The FHRS dataset for Camden was obtained from the UK government website (https://www.data.gov.uk/dataset/55022d4a-b796-46db-a7f7-c4bd800aad9a/food-hygiene-rating-scheme-camden). Selecting a government website lends credibility to the data source, considering the stringent standards and regulations associated with government datasets. My personal interest in food and restaurants, coupled with my experience as an international student, adds a valuable perspective to the selection process. Camden, being a vibrant culinary hub, presents an ideal setting for analysis. The dataset is available in CSV, XML, JSON, and RDF formats; I chose CSV because I am most familiar and comfortable with it. CSV is also widely supported, so other users can load the same file in most tools.

Acknowledging the primary disadvantage of the dataset, the prevalence of missing values poses a significant challenge that can impact the integrity of the entire analysis. The potential repercussions of imputing incorrect values in place of the missing ones are substantial, as they have the capacity to alter the perceived image of individual restaurants. This inherent risk underscores the need for a cautious and thoughtful approach to handling missing data, as any inaccuracies introduced during the imputation process could potentially skew the results and mislead interpretations.
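One way to limit the imputation risk described above is to record which values were missing before anything is filled in. The following is a minimal sketch on a small made-up frame (the rows and the `Hygiene Score Missing` column name are assumptions for illustration, not part of the dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real file; names and values are made up.
toy = pd.DataFrame({
    "Business Name": ["A CAFE", "B DELI", "C BAR"],
    "Hygiene Score": [5.0, np.nan, 0.0],
})

# Record which rows lacked a score *before* any imputation.
toy["Hygiene Score Missing"] = toy["Hygiene Score"].isna()

# Impute afterwards; the flag keeps imputed zeros distinguishable
# from zeros that were actually recorded.
toy["Hygiene Score"] = toy["Hygiene Score"].fillna(0)

# Rows whose score of 0 was genuinely recorded, not imputed.
genuine_zero = toy[(toy["Hygiene Score"] == 0) & ~toy["Hygiene Score Missing"]]
print(genuine_zero["Business Name"].tolist())
```

With a flag like this, later filters on a specific score can exclude imputed rows instead of treating them as real inspection results.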

In [1]:
%matplotlib inline
In [2]:
#Importing libraries that are used in this project

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns 
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
import folium
from folium.plugins import MarkerCluster
import json
from folium.features import GeoJson, GeoJsonPopup

Usage of libraries:

  1. NumPy (import numpy as np): NumPy is a powerful library for numerical operations in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these elements efficiently.
  2. Pandas (import pandas as pd): Pandas is a data manipulation library that provides data structures like DataFrame and Series, making it easy to handle and analyze structured data. It is widely used for tasks such as data cleaning, exploration, and preparation.
  3. Matplotlib (import matplotlib.pyplot as plt): Matplotlib is a popular data visualization library in Python. It allows users to create static, animated, and interactive plots, charts, and graphs. It provides a wide variety of customization options to create visually appealing visualizations.
  4. Seaborn (import seaborn as sns): Seaborn is built on top of Matplotlib and provides a high-level interface for statistical data visualization. It comes with several built-in themes and color palettes to enhance the aesthetics of plots and simplify the creation of complex visualizations.
  5. Folium (import folium): Folium is a Python library used for creating interactive leaflet maps. It leverages the Leaflet.js library and allows users to embed maps with markers, popups, and other features directly into a Jupyter notebook or a standalone HTML file.
  6. Plotly Express (import plotly.express as px): Plotly Express is a high-level data visualization library that simplifies the creation of interactive plots and dashboards. It is known for its user-friendly syntax and supports a wide range of chart types, making it suitable for exploratory data analysis and communication of results.
In [3]:
# Reading the Camden FHRS data from a CSV file into a DataFrame
df = pd.read_csv('Food_Hygiene_Rating_Scheme_Camden.csv')
df
Out[3]:
Business Name Address Line 1 Address Line 2 Address Line 3 Postcode Business Type ID Business Type Description Food Hygiene Rating Scheme ID Food Hygiene Rating Scheme Type Hygiene Score ... Ward Code Ward Name Easting Northing Longitude Latitude Spatial Accuracy Last Uploaded Location Organisation URI
0 SUSHI DAILY NaN 246 High Holborn NaN WC1V 7EX 4613 Retailers - other 1653945 FHRS NaN ... NaN NaN NaN NaN NaN NaN Unknown 26/11/2023 NaN http://opendatacommunities.org/id/london-borou...
1 DELICIOUSLY ELLA NaN 250 Tottenham Court Road NaN W1T 7QZ 4613 Retailers - other 1567445 FHRS 0.0 ... NaN NaN NaN NaN NaN NaN Unknown 26/11/2023 NaN http://opendatacommunities.org/id/london-borou...
2 KINGS CROSS P BUILDING CAFE Meta, Kings Cross P Building 12 Lewis Cubitt Square NaN N1C 4DR 1 Restaurant/Cafe/Canteen 1453210 FHRS 0.0 ... NaN NaN NaN NaN NaN NaN Unknown 26/11/2023 NaN http://opendatacommunities.org/id/london-borou...
3 SRILANKAN FOOD NaN Chalton Street Market NaN NW1 1JH 4613 Retailers - other 1616574 FHRS NaN ... NaN NaN NaN NaN NaN NaN Unknown 26/11/2023 NaN http://opendatacommunities.org/id/london-borou...
4 FISH PLAICE NaN 32 Museum Street NaN WC1A 1LH 7844 Takeaway/sandwich shop 1492507 FHRS 5.0 ... E05013653 Bloomsbury 530118.0 181556.0 -0.126070 51.517940 Unknown 26/11/2023 (51.51794, -0.12607) http://opendatacommunities.org/id/london-borou...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3829 Fruit and Vegetables Corner of Oxford Arms NaN Islip Street NaN NW5 2DJ 4613 Retailers - other 424726 FHRS 0.0 ... E05013664 Kentish Town South 529100.0 185029.0 -0.139447 51.549386 Unknown 26/11/2023 (51.549386, -0.139447) http://opendatacommunities.org/id/london-borou...
3830 FUNKY CHIPS Unit 705, The Stables Market Chalk Farm Road NaN NW1 8AH 7844 Takeaway/sandwich shop 1374077 FHRS 5.0 ... E05013655 Camden Town 528546.0 184231.0 -0.147738 51.542340 Unknown 26/11/2023 (51.54234, -0.147738) http://opendatacommunities.org/id/london-borou...
3831 CANDY VIBES FOR YOU LTD NaN NaN NaN NaN 4613 Retailers - other 1527880 FHRS 5.0 ... NaN NaN NaN NaN NaN NaN Unknown 26/11/2023 NaN http://opendatacommunities.org/id/london-borou...
3832 EGGLA NaN 44 Chalk Farm Road NaN NW1 8AJ 4613 Retailers - other 1380775 FHRS 10.0 ... E05013660 Haverstock 528485.0 184302.0 -0.148591 51.542992 Unknown 26/11/2023 (51.542992, -0.148591) http://opendatacommunities.org/id/london-borou...
3833 The Convenience Store NaN 63 St Giles High Street NaN WC2H 8LE 4613 Retailers - other 952255 FHRS 10.0 ... E05013662 Holborn and Covent Garden 530013.0 181279.0 -0.127684 51.515475 Unknown 26/11/2023 (51.515475, -0.127684) http://opendatacommunities.org/id/london-borou...

3834 rows × 30 columns

In [4]:
#Displays the columns of the dataset
df.columns
Out[4]:
Index(['Business Name', 'Address Line 1', 'Address Line 2', 'Address Line 3',
       'Postcode', 'Business Type ID', 'Business Type Description',
       'Food Hygiene Rating Scheme ID', 'Food Hygiene Rating Scheme Type',
       'Hygiene Score', 'Structural Score', 'Confidence In Management Score',
       'Rating Value', 'Rating Date', 'New Rating Pending',
       'Local Authority Business ID', 'Local Authority Code',
       'Local Authority Name', 'Local Authority Email Address',
       'Local Authority Website', 'Ward Code', 'Ward Name', 'Easting',
       'Northing', 'Longitude', 'Latitude', 'Spatial Accuracy',
       'Last Uploaded', 'Location', 'Organisation URI'],
      dtype='object')
In [5]:
#Displays number of rows and columns in the dataset
df.shape
Out[5]:
(3834, 30)
In [6]:
# Identifying the numeric columns
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numeric_df = df.select_dtypes(include=numerics)
len(numeric_df.columns)
Out[6]:
12
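The explicit list of dtype names above can also be written with the `"number"` selector of `select_dtypes`. This is a small sketch on a made-up frame (the rows are invented for illustration); note that boolean columns are not counted as numeric by this selector, and that in the real data the count of 12 includes 'Address Line 3', which is stored as a float column only because it is entirely empty.

```python
import pandas as pd

# Toy frame with a text, an integer, a float and a boolean column.
toy = pd.DataFrame({
    "Business Name": ["A CAFE", "B DELI"],
    "Business Type ID": [1, 4613],
    "Hygiene Score": [5.0, 10.0],
    "New Rating Pending": [False, True],
})

# "number" selects every integer and float dtype; bool is excluded.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```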
In [7]:
#This generates the descriptive statistics for each numerical column
df.describe()
Out[7]:
Address Line 3 Business Type ID Food Hygiene Rating Scheme ID Hygiene Score Structural Score Confidence In Management Score Local Authority Business ID Local Authority Code Easting Northing Longitude Latitude
count 0.0 3834.000000 3.834000e+03 3291.000000 3291.000000 3291.000000 3834.000000 3834.0 3422.000000 3422.000000 3422.000000 3422.000000
mean NaN 4024.980438 1.047931e+06 4.539654 5.504406 5.721665 139453.720918 506.0 528954.768556 183316.972238 -0.142184 51.534033
std NaN 3543.641317 4.563708e+05 4.416657 4.286924 4.855387 78711.930386 0.0 1629.282072 1494.711161 0.023094 0.013699
min NaN 1.000000 4.236590e+05 0.000000 0.000000 0.000000 16.000000 506.0 523976.000000 180938.000000 -0.213138 51.512406
25% NaN 1.000000 4.264245e+05 0.000000 5.000000 0.000000 60467.750000 506.0 528426.750000 181873.000000 -0.148897 51.520718
50% NaN 4613.000000 1.140803e+06 5.000000 5.000000 5.000000 195852.000000 506.0 529261.000000 183438.500000 -0.137934 51.535036
75% NaN 7843.750000 1.453214e+06 5.000000 10.000000 10.000000 200783.250000 506.0 530145.000000 184516.250000 -0.125506 51.545314
max NaN 7846.000000 1.676120e+06 25.000000 25.000000 30.000000 203623.000000 506.0 532012.000000 187472.000000 -0.097915 51.571527
In [8]:
#displays columns of dataset and their types
df.dtypes
Out[8]:
Business Name                       object
Address Line 1                      object
Address Line 2                      object
Address Line 3                     float64
Postcode                            object
Business Type ID                     int64
Business Type Description           object
Food Hygiene Rating Scheme ID        int64
Food Hygiene Rating Scheme Type     object
Hygiene Score                      float64
Structural Score                   float64
Confidence In Management Score     float64
Rating Value                        object
Rating Date                         object
New Rating Pending                    bool
Local Authority Business ID          int64
Local Authority Code                 int64
Local Authority Name                object
Local Authority Email Address       object
Local Authority Website             object
Ward Code                           object
Ward Name                           object
Easting                            float64
Northing                           float64
Longitude                          float64
Latitude                           float64
Spatial Accuracy                    object
Last Uploaded                       object
Location                            object
Organisation URI                    object
dtype: object
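The date columns are listed as `object`, meaning they are held as text. If date arithmetic or sorting is needed later, they can be parsed explicitly. The sketch below assumes the day/month/year layout visible in the 'Last Uploaded' values shown earlier (26/11/2023); the toy rows, including the placeholder string, are made up.

```python
import pandas as pd

# Toy column using the same dd/mm/yyyy layout as 'Last Uploaded'.
toy = pd.DataFrame({"Last Uploaded": ["26/11/2023", "26/11/2023", "unknown"]})

# Parse with an explicit format; values that do not match become NaT.
toy["Last Uploaded"] = pd.to_datetime(
    toy["Last Uploaded"], format="%d/%m/%Y", errors="coerce"
)

print(toy["Last Uploaded"].dtype)
print(int(toy["Last Uploaded"].isna().sum()), "value(s) could not be parsed")
```

Whether 'Rating Date' uses the same layout is not shown in the output above, so its format should be checked before reusing this pattern.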

4. Data Cleaning

4.1 Identifying missing values

In [9]:
#Used for checking null values and returns the result as boolean
df.isna()
Out[9]:
Business Name Address Line 1 Address Line 2 Address Line 3 Postcode Business Type ID Business Type Description Food Hygiene Rating Scheme ID Food Hygiene Rating Scheme Type Hygiene Score ... Ward Code Ward Name Easting Northing Longitude Latitude Spatial Accuracy Last Uploaded Location Organisation URI
0 False True False True False False False False False True ... True True True True True True False False True False
1 False True False True False False False False False False ... True True True True True True False False True False
2 False False False True False False False False False False ... True True True True True True False False True False
3 False True False True False False False False False True ... True True True True True True False False True False
4 False True False True False False False False False False ... False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3829 False True False True False False False False False False ... False False False False False False False False False False
3830 False False False True False False False False False False ... False False False False False False False False False False
3831 False True True True True False False False False False ... True True True True True True False False True False
3832 False True False True False False False False False False ... False False False False False False False False False False
3833 False True False True False False False False False False ... False False False False False False False False False False

3834 rows × 30 columns

4.2 Count of missing values

In [10]:
# Calculating the number of missing values for each column and sorting in descending order
miss_values = df.isna().sum().sort_values(ascending=False) 
miss_values
Out[10]:
Address Line 3                     3834
Address Line 1                     2641
Hygiene Score                       543
Structural Score                    543
Confidence In Management Score      543
Rating Date                         461
Ward Name                           422
Ward Code                           422
Latitude                            412
Longitude                           412
Location                            412
Northing                            412
Easting                             412
Address Line 2                      122
Postcode                            121
Local Authority Website               0
Spatial Accuracy                      0
Last Uploaded                         0
Business Name                         0
Local Authority Business ID           0
Local Authority Email Address         0
Local Authority Name                  0
Local Authority Code                  0
New Rating Pending                    0
Rating Value                          0
Food Hygiene Rating Scheme Type       0
Food Hygiene Rating Scheme ID         0
Business Type Description             0
Business Type ID                      0
Organisation URI                      0
dtype: int64
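Raw counts are easier to compare when expressed as a share of all rows. From the counts above, 'Address Line 3' is missing in every row and 'Address Line 1' in roughly 69% of rows (2641 of 3834). A minimal sketch of the calculation on a made-up frame:

```python
import numpy as np
import pandas as pd

# Toy frame; values are invented for illustration.
toy = pd.DataFrame({
    "Address Line 1": [np.nan, np.nan, "Unit 705", np.nan],
    "Postcode": ["WC1V 7EX", "W1T 7QZ", np.nan, "NW1 8AH"],
    "Business Name": ["A", "B", "C", "D"],
})

# The mean of a boolean mask is the fraction of True values,
# so isna().mean() gives the missing share per column.
missing_pct = (toy.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct.round(1))
```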
In [11]:
miss_values.plot(kind='barh',color='red')
Out[11]:
<Axes: >

4.3 Removing duplicates

In [13]:
# Removing duplicate rows from the dataset
df.drop_duplicates(inplace=True)
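`drop_duplicates` compares whole rows, so it only removes records that match in every column. Before dropping anything it can help to count how many duplicates exist, both as exact rows and on an identifier column. The sketch below uses a made-up frame; treating 'Food Hygiene Rating Scheme ID' as a unique key is an assumption for illustration.

```python
import pandas as pd

# Toy frame with one fully repeated row.
toy = pd.DataFrame({
    "Business Name": ["A CAFE", "A CAFE", "B DELI"],
    "Food Hygiene Rating Scheme ID": [101, 101, 202],
})

# Rows that repeat an earlier row in every column.
n_exact = int(toy.duplicated().sum())

# Rows that repeat an earlier row's ID, whatever the other columns hold.
n_same_id = int(toy.duplicated(subset=["Food Hygiene Rating Scheme ID"]).sum())

deduped = toy.drop_duplicates().reset_index(drop=True)
print(n_exact, n_same_id, len(deduped))
```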

4.4 Filling missing values

In [14]:
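# Missing values in the columns below are filled with 0.
# Note: 0 is also a valid score value, so imputed and genuinely recorded zeros
# become indistinguishable; 'Location' holds text but is filled with 0 here as well.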
numeric_columns = ['Address Line 3', 'Business Type ID', 'Food Hygiene Rating Scheme ID','Hygiene Score',
                   'Structural Score', 'Confidence In Management Score',
       'Local Authority Business ID', 'Local Authority Code', 'Easting',
       'Northing', 'Longitude', 'Latitude','Location']
for col in numeric_columns:
    df[col]=df[col].fillna(0)
In [15]:
# Missing values in string columns are filled with 'unknown'.
string_columns = ['Business Name', 'Address Line 1', 'Address Line 2', 'Postcode',
       'Business Type Description', 'Food Hygiene Rating Scheme Type',
       'Rating Value', 'Rating Date', 'Local Authority Name',
       'Local Authority Email Address', 'Local Authority Website', 'Ward Code', 'Spatial Accuracy', 'Last Uploaded',
       'Organisation URI']
for col in string_columns:
    df[col]=df[col].fillna("unknown")
In [16]:
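# Back-fill missing ward names from the next row that has one
# (this borrows the ward of a neighbouring record, which may be a different ward)
df['Ward Name'] = df['Ward Name'].bfill()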
In [17]:
df.isna().sum().sort_values(ascending=False)
Out[17]:
Business Name                      0
Address Line 1                     0
Location                           0
Last Uploaded                      0
Spatial Accuracy                   0
Latitude                           0
Longitude                          0
Northing                           0
Easting                            0
Ward Name                          0
Ward Code                          0
Local Authority Website            0
Local Authority Email Address      0
Local Authority Name               0
Local Authority Code               0
Local Authority Business ID        0
New Rating Pending                 0
Rating Date                        0
Rating Value                       0
Confidence In Management Score     0
Structural Score                   0
Hygiene Score                      0
Food Hygiene Rating Scheme Type    0
Food Hygiene Rating Scheme ID      0
Business Type Description          0
Business Type ID                   0
Postcode                           0
Address Line 3                     0
Address Line 2                     0
Organisation URI                   0
dtype: int64

5. Exploratory Data Analysis

5.1 Bar chart for Business Type Distribution:

This visualization offers a rapid overview of the various business types present in the dataset along with their respective counts. It proves useful for comprehending the composition and diversity of businesses in the dataset, enabling us to select a specific category for more in-depth analysis.

In [18]:
# Assuming df is your DataFrame
business_type_counts = df['Business Type Description'].value_counts()

# Create a figure and axis
fig, ax = plt.subplots(figsize=(12, 6))

# Bar Plot
business_type_counts.plot(kind='bar', color='coral', ax=ax)
ax.set_title('Business Type Distribution')
ax.set_xlabel('Business Type')
ax.set_ylabel('Count')

# Table
table_data = pd.DataFrame({'Business Type': business_type_counts.index, 'Count': business_type_counts.values})
table = ax.table(cellText=table_data.values, colLabels=table_data.columns, cellLoc='center', loc='bottom', bbox=[0, -1.20, 1, 0.5])

# Adjust table font size
table.auto_set_font_size(False)
table.set_fontsize(10)

plt.show()

The bar chart illustrates that the 'Restaurant/Cafe/Canteen' category has the highest count, making it the predominant business type in the dataset. It is followed by 'Takeaway/sandwich shop', with 'Retailers - other' trailing behind.

5.2 Interactive barplot for Hygiene scores by Business Type

This Plotly Express visualization presents an interactive view of hygiene scores across business types in the Food Hygiene Rating Scheme (FHRS) dataset. The script takes the first five records for each distinct 'Hygiene Score' value in the dataset and plots them. The color-coded bars, differentiated by business type, show which business types appear at each score level, offering a first look at the performance of different food establishments in Camden.

In [19]:
import plotly.express as px

# Assuming your DataFrame is named df
# Take the first 5 rows for each distinct 'Hygiene Score' value (head() keeps row order; it does not rank)
top5_values = df.groupby('Hygiene Score').head(5)

# Create an interactive bar plot using Plotly
fig = px.bar(top5_values, x='Business Type Description', y='Hygiene Score', color='Business Type Description',
             title='Top 5 Hygiene Scores by Business Type', labels={'Hygiene Score': 'Hygiene Score'},)

# Show the plot
fig.show()

The Plotly Express bar chart shows that the 'Restaurant/Cafe/Canteen' business type appears most often among the sampled records, across hygiene scores ranging from 0 to 25. This suggests an opportunity for further exploration and in-depth surveys to pinpoint specific restaurants within this category. Identifying the names and locations of these establishments can provide a more granular understanding of their hygiene practices and potentially uncover patterns or trends that contribute to their scores. Such detailed investigations can pave the way for targeted improvements, dissemination of best practices, and enhanced transparency, ultimately fostering a safer and more informed food environment in Camden.
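Because `groupby(...).head(n)` and a ranked selection are easy to confuse, here is a minimal sketch of the difference on a made-up frame (the rows are invented for illustration). `head` keeps the first rows of each group in their existing order, while `nlargest` ranks rows by value.

```python
import pandas as pd

# Toy frame; business names and scores are made up.
toy = pd.DataFrame({
    "Business Name": ["A", "B", "C", "D", "E"],
    "Hygiene Score": [5, 25, 5, 10, 5],
})

# First two rows of each score group, in original row order.
first_two_per_score = toy.groupby("Hygiene Score").head(2)

# Two rows with the numerically largest scores.
two_highest = toy.nlargest(2, "Hygiene Score")

print(first_two_per_score["Business Name"].tolist())  # ['A', 'B', 'C', 'D']
print(two_highest["Business Name"].tolist())          # ['B', 'D']
```

In the chart above, the first form is used, so the plotted records are a sample by file order rather than a ranking.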

5.3 Heatmap for Restaurant/Cafe/Canteen with Good Hygiene scores

We are narrowing our analysis to the 'Restaurant/Cafe/Canteen' category, specifically emphasizing high hygiene scores such as 25 and 20. By identifying the business names associated with these exemplary hygiene scores, our aim is to provide valuable information to consumers, empowering them to make informed choices about dining options that prioritize hygiene. Given the pivotal role of hygiene in ensuring food quality and, consequently, the well-being of individuals, this analysis serves as a crucial resource. Beyond benefiting consumers, it also contributes to enhancing the visibility and appreciation of these restaurants, recognizing and promoting their commitment to maintaining high hygiene standards.

In [20]:
# Filter DataFrame for the specific business type and multiple Hygiene Scores
selected_business_type = 'Restaurant/Cafe/Canteen'
hygiene_scores = [25,20]
filtered_df = df[(df['Business Type Description'] == selected_business_type) & (df['Hygiene Score'].isin(hygiene_scores))]

# Create a pivot table for the heatmap
pivot_df = filtered_df.pivot_table(index='Business Name', columns='Business Type Description', values='Hygiene Score', aggfunc='mean', fill_value=0)

# Heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(pivot_df, annot=True, cmap='viridis', fmt='g')
plt.title(f'Heatmap for {selected_business_type} with Hygiene Scores {", ".join(map(str, hygiene_scores))}')
plt.xlabel('Business Type Description')
plt.ylabel('Business Name')
plt.show()

The heatmap reveals that five restaurants, namely BEBEK MANGEL, David's Deli, NE ZHA, Redemption Roasters, and TARIM, have achieved the exceptional hygiene score of 25, reflecting their steadfast commitment to ensuring the health and well-being of their patrons. These establishments exhibit a noteworthy dedication to maintaining high hygiene standards, a fact corroborated by the thorough assessments conducted by food safety officers. Given Camden's status as a global attraction, drawing visitors from around the world, the availability of hygienic food becomes paramount. The credibility of these hygiene scores, earned through meticulous inspections, positions these restaurants as trustworthy choices for individuals seeking a dining experience prioritizing cleanliness and safety.

5.4 Multivariate Exploration: Business Names, Ward Names, and Postcodes for Hygiene Scores of 25

This interactive scatter plot investigates high-scoring restaurants in the 'Restaurant/Cafe/Canteen' category, with a particular emphasis on those with a hygiene score of 25. The dynamic representation of the spatial distribution and clustering of Camden's top-performing restaurants is provided by the chart, which smoothly blends the fields of "Ward Name," "Business Name," and "Postcode." Plotting each marker as a distinct establishment based on postcode allows for better delineation. This multivariate chart offers an easy-to-use interface for exploring the categories and geographic features of these excellent restaurants.

In [21]:
# Assuming your DataFrame is named df
filtered_data = df[(df['Hygiene Score'] == 25) & (df['Business Type Description'] == 'Restaurant/Cafe/Canteen')]

# Create an interactive scatter plot with Plotly Express
fig = px.scatter(filtered_data, 
                 x='Ward Name', 
                 y='Business Name', 
                 color='Postcode',  # Use 'Postcode' for color-coding
                 size=[8]*len(filtered_data),  # Use the same marker size (8) for every point
                 labels={'Ward Name': 'Ward Name', 'Business Name': 'Business Name'},
                 title='Multivariate Chart for Hygiene Score 25, Restaurant/Cafe/Canteen',
                 template='plotly_dark')

# Show the interactive chart
fig.show()

Beyond merely acknowledging top ratings, our focus extends to identifying the 'Ward Name' and 'Postcode,' allowing us to guide individuals not only to the highest-rated establishments but also to specific locations associated with these exemplary dining experiences. This nuanced approach acknowledges the potential variations in taste and hygiene standards among different branches of the same restaurant, ensuring that patrons receive accurate guidance. Notably, the chart highlights that NE ZHA and TARIM, both outstanding in terms of hygiene, share the same geographical area, Holborn and Covent Garden. This insight underscores the significance of considering location alongside ratings for a more informed dining choice.

5.5 Geospatial Distribution of Highly Rated Restaurants (Hygiene Score 20-25) in Camden:

This interactive Folium map showcases the geospatial distribution of distinguished restaurants falling within the 'Restaurant/Cafe/Canteen' category and boasting hygiene scores between 20 and 25. Centered around the UK, with a zoom level optimized for clear visualization, the map employs Marker Clusters to enhance marker grouping and overall map readability. Each marker on the map represents a high-performing restaurant, with information including the business name and associated ward name displayed in a popup. By providing a visual representation of these well-rated establishments across the Camden area, this map serves as a valuable tool for users seeking to explore and make informed dining choices based on both hygiene standards and geographic proximity.

In [35]:
# Filter the data based on the specified criteria
filtered_data = df[(df['Hygiene Score'].between(20, 25)) & (df['Business Type Description'] == 'Restaurant/Cafe/Canteen')]

# Create a Folium map centered around the UK
uk_map = folium.Map(location=[51, 0], zoom_start=8)

# Create a MarkerCluster to group markers for better visualization
marker_cluster = MarkerCluster().add_to(uk_map)

# Add markers for each business, skipping rows whose missing coordinates were filled with 0
for index, row in filtered_data.iterrows():
    if row['Latitude'] == 0 or row['Longitude'] == 0:
        continue
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"{row['Business Name']} - {row['Ward Name']}",
        icon=None,  # You can customize the icon if needed
    ).add_to(marker_cluster)

# Display the Folium map
uk_map
Out[35]:
[Interactive Folium map: clustered markers for the filtered restaurants]
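Since missing coordinates were filled with 0 during cleaning, it may be worth separating mappable rows from the rest and deriving the map centre from the data itself. The following is a sketch on a made-up frame (coordinates are invented); it uses pandas only and leaves the map drawing to the cell above.

```python
import pandas as pd

# Toy frame; the last row mimics a record whose coordinates were filled with 0.
toy = pd.DataFrame({
    "Business Name": ["A CAFE", "B DELI", "C BAR"],
    "Latitude": [51.52, 51.54, 0.0],
    "Longitude": [-0.12, -0.14, 0.0],
})

# Keep only rows with real coordinates.
has_coords = (toy["Latitude"] != 0) & (toy["Longitude"] != 0)
mappable = toy[has_coords]

# Mean latitude/longitude of the remaining rows as a candidate map centre.
centre = [mappable["Latitude"].mean(), mappable["Longitude"].mean()]
print(len(mappable), [round(c, 2) for c in centre])
```

A list like `centre` could then be passed as the `location` argument of `folium.Map` in place of a fixed coordinate, with a higher `zoom_start` if a borough-level view is preferred.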

5.6 Sunburst chart for restaurants with Hygiene Score 0:

This Sunburst Chart delves into the hygiene landscape of restaurants categorized as 'Restaurant/Cafe/Canteen' in Camden, specifically focusing on establishments with a hygiene score of 0. The chart provides a visually intuitive exploration of the ward-wise distribution and business names of selected restaurants with low hygiene scores.

In [23]:
# Filter the data based on the specified criteria
filtered_data = df[(df['Hygiene Score'] == 0) & (df['Business Type Description'] == 'Restaurant/Cafe/Canteen')]

# Sort alphabetically by business name and keep the first 10 rows
sorted_data = filtered_data.sort_values(by='Business Name').head(10)

# Create a sunburst chart using plotly.express
fig = px.sunburst(sorted_data, path=['Ward Name', 'Business Name'], title='Sunburst Chart for Hygiene Score 0, Restaurant/Cafe/Canteen (First 10 by Name)')

# Show the chart
fig.show()

This analysis holds paramount significance as the hygiene score directly impacts human health, making it a crucial aspect for consideration. Customers invest their trust, payment, and well-being in these restaurants, emphasizing the necessity for a high standard of food quality. The insights derived from this analysis not only benefit the public by guiding them towards establishments with better hygiene but also provide an opportunity for restaurants to identify areas for improvement. Additionally, food safety officers can leverage this information for targeted inspections, contributing to overall enhanced food safety standards in the dining establishments of Camden.

5.7 Exploring Pub/Bar/Nightclub Ratings: A Ward-wise Analysis with Seaborn Stripplot:

For this analysis, we focus on Pub/Bar/Nightclub establishments, comparing them on the 'Rating Value' column, which holds the food hygiene rating assigned under the scheme rather than a customer review score. To keep the plot readable, a Seaborn stripplot is employed. This visualization provides a snapshot of the business type, its ratings, and their distribution across different wards.

In the resulting stripplot, each dot signifies a specific business, and distinctive colors indicate the corresponding ward. Given the large dataset, we keep up to the first seven records for each rating value. This approach allows us to condense the information and offer a meaningful representation of key ratings for Pub/Bar/Nightclubs. The visualization not only highlights these ratings but also provides an overview of their distribution across various wards. This insight is useful for businesses, policymakers, and stakeholders seeking to understand the performance of Pub/Bar/Nightclubs in the specified area.

In [24]:
# Replace infinite values with NaN in the entire DataFrame
df.replace([np.inf, -np.inf], np.nan, inplace=True)

# Convert 'Rating Value' to numeric (if it's not already)
df['Rating Value'] = pd.to_numeric(df['Rating Value'], errors='coerce')

# Filter rows with non-null 'Rating Value' and specific business type
filtered_df = df[(df['Business Type Description'] == 'Pub/bar/nightclub') & df['Rating Value'].notnull()]

# Keep up to the first 7 rows for each 'Rating Value' (head() preserves row order; it does not rank)
top_values = filtered_df.groupby('Rating Value').head(7)

# Plot using Seaborn with unique color for each 'Ward Name'
plt.figure(figsize=(12, 8))
sns.stripplot(
    data=top_values,
    x='Business Type Description',
    y='Rating Value',
    hue='Ward Name',
    palette='viridis',
    jitter=True,
    dodge=True,
    size=12,  # Adjust the size of each dot
    alpha=0.7
)

# Customize the plot
plt.title('Top 5-7 Ratings for Pub/Bar/Nightclub')
plt.xlabel('Business Type')
plt.ylabel('Rating Value')

# Show the legend outside the plot
plt.legend(title='Ward Name', bbox_to_anchor=(1.05, 1), loc='upper left')

# Display the plot
plt.show()

The distribution of ratings for Pub/Bar/Nightclub establishments shows some clear patterns. The 0-rated pubs shown are in Bloomsbury and Kings Cross, while 5-rated pubs/bars are spread across several wards, including Kings Cross, Regents Park, and Bloomsbury. For a more in-depth analysis, we narrow our focus to establishments with rating values of 0 and 5. Isolating the best and worst-rated pubs lets us look more closely at how these ratings vary between areas.

5.8 Exploring Pub/Bar/Nightclub Ratings: A Faceted Analysis of Rating Values 0 and 5 Across Wards:¶

We select Rating Values 0 and 5 and restrict the data to the five wards with the most establishments carrying either rating. The counts are shown in a FacetGrid with one panel per rating, followed by a table of the number of businesses per ward for each rating.

In [25]:
# Filter the data for 'Pub/bar/nightclub' business type and Rating Values '0' and '5'
pub_data_filtered = df[(df['Business Type Description'] == 'Pub/bar/nightclub') & (df['Rating Value'].isin([0, 5]))]

# Get the top 5 ward names based on count
top_5_wards = pub_data_filtered['Ward Name'].value_counts().nlargest(5).index

# Filter the data for the top 5 ward names
pub_data_filtered_top5 = pub_data_filtered[pub_data_filtered['Ward Name'].isin(top_5_wards)]

# Create a FacetGrid for each Rating Value
g = sns.FacetGrid(pub_data_filtered_top5, col='Rating Value', col_wrap=2, height=5, palette='viridis')

# Map a count plot to each facet
g.map(sns.countplot, 'Ward Name', palette='plasma', order=top_5_wards)

# Customize the charts
g.set_titles("Rating Value {col_name}")
g.set_axis_labels('Ward Name', 'Count')

# Rotate x-axis labels for better readability
g.set_xticklabels(rotation=45, ha='right')

# Display the charts
plt.tight_layout()
plt.show()

# Create tables for each facet
for rating_value in [0, 5]:
    table_data = pub_data_filtered_top5[pub_data_filtered_top5['Rating Value'] == rating_value].pivot_table(
        index='Ward Name',
        values='Business Name',
        aggfunc='count'
    )
    print(f'\nTable for Rating Value {rating_value}:\n{table_data.to_markdown()}')
Table for Rating Value 0:
| Ward Name   |   Business Name |
|:------------|----------------:|
| Bloomsbury  |               1 |
| Kings Cross |               1 |

Table for Rating Value 5:
| Ward Name                 |   Business Name |
|:--------------------------|----------------:|
| Bloomsbury                |              33 |
| Camden Town               |              14 |
| Holborn and Covent Garden |              42 |
| Kings Cross               |              11 |
| Regents Park              |              14 |

The FacetGrid plot gives a visual overview of Pub/Bar/Nightclub ratings in Camden for Rating Values 0 and 5 across the five wards with the most such establishments. Rating 5 is most common in Holborn and Covent Garden (42) and Bloomsbury (33), while only two establishments are rated 0, one each in Bloomsbury and Kings Cross. These counts show how prevalent each rating is in each ward and help stakeholders, businesses, and policymakers identify wards where interventions or improvements may be needed.
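
A single ward-by-rating table can make this comparison easier to read than separate tables. The following is a minimal sketch using made-up rows, not the real counts; the helper name `ward_rating_counts` is an assumption for illustration. It uses `pd.crosstab` to count establishments per ward and rating, then derives the share of 0-rated establishments in each ward.

```python
import pandas as pd

# Hypothetical toy rows standing in for the filtered pub/bar/nightclub data.
toy = pd.DataFrame({
    "Ward Name": ["Bloomsbury", "Bloomsbury", "Kings Cross", "Kings Cross", "Camden Town"],
    "Rating Value": [5, 0, 5, 5, 5],
})


def ward_rating_counts(frame: pd.DataFrame) -> pd.DataFrame:
    """Count establishments per ward (rows) and rating value (columns)."""
    return pd.crosstab(frame["Ward Name"], frame["Rating Value"])


counts = ward_rating_counts(toy)

# Share of 0-rated establishments among those shown for each ward.
share_zero = counts[0] / counts.sum(axis=1)

print(counts)
print(share_zero)
```

Shares rather than raw counts help when wards differ greatly in how many pubs and bars they contain.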

5.9 Geospatial Visualization of Pub/Bar/Nightclub Ratings in Camden from best to worst¶

In this geospatial analysis, our focus is on Pub/Bar/Nightclub establishments in Camden. The goal is to visualize where the best and worst-rated establishments are located. The color scheme is green for a rating of 5, yellow for ratings 2 to 4, and red for ratings 0 and 1; because the code below keeps only ratings 0, 1, and 5, only green and red markers appear on the map. Markers are clustered, and each popup shows the business name and its rating. Explore the map to see the geographic patterns of these ratings and what they may mean for businesses and consumers.
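
The color rule described above can be written as a small helper, which is easier to read and test than a nested conditional expression. This is a sketch only; the function name `rating_to_color` is hypothetical and not part of the notebook.

```python
def rating_to_color(rating) -> str:
    """Map a rating value to a marker color: 5 is green, 0 and 1 are red, anything else is yellow."""
    if rating == 5:
        return "green"
    if rating in (0, 1):
        return "red"
    return "yellow"


# Works for integer and float ratings alike (e.g. after pd.to_numeric).
for value in [0, 1, 3, 5.0]:
    print(value, rating_to_color(value))
```

Inside the marker loop, such a helper could replace the inline conditional, for example `marker_color = rating_to_color(row['Rating Value'])`.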

In [26]:
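# Keep pubs/bars/nightclubs rated 0, 1 or 5; ratings 2-4 are excluded here,
# so the 'yellow' fallback in the marker color rule below is never triggered
filtered_data_top5 = df[(df['Business Type Description'] == 'Pub/bar/nightclub') & (df['Rating Value'].isin([0, 1, 5]))]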

# Create a base map centered around a location (e.g., Kings Cross)
map_center = [51.5326, -0.1240] 
mymap = folium.Map(location=map_center, zoom_start=14)

# Create a MarkerCluster for better visualization of multiple markers
marker_cluster = MarkerCluster().add_to(mymap)

# Add markers to the map for each business
for index, row in filtered_data_top5.iterrows():
    # Assuming your DataFrame has 'Latitude' and 'Longitude' columns
    lat, lon = row['Latitude'], row['Longitude']
    
    # Create a popup with information
    popup_text = f"Business Name: {row['Business Name']}<br>Rating Value: {row['Rating Value']}"
    
    # Customize the marker color based on Rating Value
    marker_color = 'green' if row['Rating Value'] == 5 else 'red' if row['Rating Value'] in [0, 1] else 'yellow'
    
    # Create a marker with a circle representing the bubble
    folium.CircleMarker(
        location=[lat, lon],
        radius=10,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.6,
        popup=popup_text
    ).add_to(marker_cluster)

# Customize the map appearance
folium.TileLayer('openstreetmap').add_to(mymap)  # Change basemap to OpenStreetMap
folium.LayerControl().add_to(mymap)  # Add layer control for different basemaps

# Save or display the map
mymap.save('customized_map.html')   
mymap

6. Conclusion :¶

Throughout this project, we encountered and overcame several challenges in deriving meaningful insights from Camden's FHRS dataset. A notable difficulty was missing or incomplete data in fields such as hygiene scores, rating values, addresses, and geographical coordinates, which hindered a full understanding of hygiene standards and their spatial distribution. To address this, we employed data imputation techniques to reduce the impact of missing values.

Despite these challenges, the analysis provided valuable insights. The analysis of business types highlighted the dominance of 'Restaurant/Cafe/Canteen', while the heatmap showcased top performers like BEBEK MANGEL and NE ZHA, guiding targeted improvements. The geospatial analysis, illustrated in the Folium map, emphasized the need to consider location alongside ratings when making dining choices. Exploring Pub/Bar/Nightclub ratings revealed where establishments with Rating Values 0 and 5 are located, and the FacetGrid plot summarized these counts by ward, pointing to wards that may need intervention.

The project's significance lies in promoting public health, guiding consumers to establishments with better hygiene, and aiding food safety officers in targeted inspections. In conclusion, this project, despite its challenges, serves as a resource for fostering a safer and more transparent food environment in Camden, contributing to better food safety standards, informed decision-making, and improved well-being for both businesses and consumers.

7. Evaluation:¶

The exploration of Camden's FHRS dataset aimed to provide a comprehensive understanding of the local food scene, extracting valuable insights for informed decision-making and enhanced consumer awareness. Throughout the process, the challenges of missing or incomplete data were acknowledged, prompting the use of data imputation techniques with a mindful approach to minimize potential biases. The analysis successfully revealed patterns in business types, highlighted top performers, and showcased geographic distributions.

An honest reflection recognizes the impact of missing values on the analysis, with an emphasis on the cautious handling of imputation to mitigate risks of misinterpretation. The project's significance lies in its contribution to public health, guiding consumers to establishments with better hygiene, aiding food safety officers, and fostering a safer food environment in Camden.
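
One way to make the impact of missing values explicit is to report the share of missing entries per column before any imputation is applied. The sketch below uses a made-up frame; the values and the helper name `missing_share` are assumptions for illustration, not figures from the FHRS dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "Rating Value": [5, np.nan, 0, 4],
    "Latitude": [51.53, 51.52, np.nan, np.nan],
    "Business Name": ["A", "B", "C", "D"],
})


def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return frame.isna().mean().sort_values(ascending=False)


report = missing_share(toy)
print(report)
```

Reporting these shares alongside the results lets readers judge how much each finding depends on imputed values.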

Future directions could involve refining data collection methods to reduce missing values, exploring advanced imputation techniques, and conducting in-depth investigations into the factors influencing high hygiene scores. Moreover, extending the analysis to encompass a temporal dimension could uncover trends or changes in hygiene standards over time. This project serves as a valuable foundation for ongoing research to continuously improve the understanding of food establishments in Camden and beyond.

In [27]:
%%js

// Run this cell to update your word count.

function wordcount() {
    let wordCount = 0;
    let extraCount = 0;
    let mainBody = true;

    let cells = Jupyter.notebook.get_cells();
    cells.forEach((cell) => {
        if (cell.cell_type == 'markdown') {
            let text = cell.get_text();

            // Stop counting as main body when getting to References or Appendices
            if (text.startsWith('## References') || text.startsWith('## Appendices')) {
                mainBody = false;
            }

            if (text.startsWith('## Word Count')) {
                text = '';
            }

            if (text && mainBody) {
                let words = text.toLowerCase().match(/\b[a-z\d]+\b/g);

                if (words) {
                    let cellCount = words.length;
                    wordCount += cellCount;
                }
            } else if (text) {
                let words = text.toLowerCase().match(/\b[a-z\d]+\b/g);

                if (words) {
                    let cellCount = words.length;
                    extraCount += cellCount;
                }
            }
        }
    });

    return [wordCount, extraCount];
}

let wc = wordcount();
element.append(`Main word count: ${wc[0]} (References and appendices word count: ${wc[1]})`);